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Abstract — This paper presents a significant modification to 
the Random Demodulator (RD) of Tropp et al. for sub-Nyquist 
sampling of frequency-sparse signals. The modification, termed 
constrained random demodulator, involves replacing the random 
waveform, essential to the operation of the RD, with a constrained 
random waveform that has limits on its switching rate because 
fast switching waveforms may be hard to generate cleanly. The 
result is a relaxation on the hardware requirements with a slight, 
but manageable, decrease in the recovery guarantees. The paper 
also establishes the importance of properly choosing the statistics 
of the constrained random waveform. If the power spectrum of 
the random waveform matches the distribution on the tones of 
the input signal (i.e., the distribution is proportional to the power 
spectrum), then recovery of the input signal tones is improved. 
The theoretical guarantees provided in the paper are validated 
through extensive numerical simulations and phase transition 
plots. 

Index Terms — Analog-to-digital conversion, compressive sens- 
ing, random demodulator, repetition code, restricted isometry 
property, run-length limited sequences, sub-Nyquist sampling 



I. Introduction 

MODERN signal processing relies on the sampling of 
analog signals for discrete-time processing. The stan- 
dard approach to sampling signals is based on the Shannon- 
Nyquist sampling theorem, which states that a bandlimited sig- 
nal can be faithfully reconstructed from its samples collected 
uniformly at the Nyquist rate. However, this standard approach 
to sampling can be unwieldy for signals with very large 
bandwidths due to the physical constraints on modern Analog- 
to-Digital Converter (ADC) technology. The rule of thumb 
in ADC technology is that a doubling of the sampling rate 
causes a 1 bit reduction in resolution [ 1 ] or, more explicitly, 
P = 2 B - f s where B is the effective number of bits (ENOB), a 
measure of resolution of an ADC, and f s is the sampling rate. 
This expression states that for a required sampling resolution, 
the sampling rate has a hard upper limit due to constraints 
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on the ADC technology, and vice versa. The constant P is 
dependent on the particular ADC architecture and has steadily 
increased over time as the technology has improved; the 
current state-of-the-art allows for sampling at 1 GHz with a 
resolution of approximately 10 ENOB El, P). Unfortunately, 
this increase tends to happen rather slowly compared to the 
advancement seen in other areas of technology, such as mi- 
croprocessor technology following Moore's law. In particular, 
applications such as spectrum sensing for cognitive radios push 
modern ADC technology to its limit. 

A. Random Demodulation for Sub-Nyquist Sampling 

Though Nyquist sampling is the standard approach to 
sampling, other schemes have been considered that require 
a lower sampling rate for analog-to-digital conversion. The 
key to the success of these schemes is leveraging additional 
prior information about the class of signals to be sampled 
(perhaps in addition to being bandlimited). One such class 
of signals corresponds to complex-valued signals comprising 
a relatively small number of tones (S) in a very large (two- 
sided) bandwidth (W): S <C W. We say these signals have 
sparse spectral content. This class of signals is of significant 
interest in applications such as spectrum sensing, and is the 
one we will concentrate on for the rest of this paper. We refer 
the reader to Section |TT] for a mathematically precise definition 
of this signal class. Two good architectures to sample such 
signals are the Non-Uniform Sampler (NUS) (4), (5j and the 
Random Demodulator (RD) [6]. In this paper we concentrate 
exclusively on the RD because it offers a much more general 
framework for sub-Nyquist sampling. The block diagram of 
the RD architecture is presented in Fig. [T] and will be reviewed 
in more detail lateiQ The major results for the RD can be 
summarized as follows |6, Theorems 1 and 2]: let C be a 
positive, universal constant and let W be the Nyquist rate. 
The constituent tones of signals sampled by the RD can be 
recovered with high probability if the sampling rate R scales 
as (i) R > C[SlogW + log 3 W] for signals composed of 
S randomly located tone^] and (ii) R > CS\og 6 W for 
signals composed of arbitrary S tones. Contrast these results 
to the Shannon-Nyquist sampling theorem, which guarantees 
recovery of the original signal from its samples if R > W. 

1 While we focus exclusively on a single-channel system, the analysis can 
be easily extended to the multi-channel setting. 

2 It is worth noting here that the NUS is shown to have similar results 1 5 
Theorem 1.3]. 
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Fig. 1. Block diagram of the random demodulator |6|: The input signal is 
multiplied by a waveform generated from a Rademacher chipping sequence, 
then low-pass filtered, and finally sampled at a sub-Nyquist rate R <C W. 



A building block of the RD is a white noise-like, bipolar 
modulating waveform p m (t) (see Fig. [T]). This waveform 
switches polarity at the Nyquist rate of the input signal. 
An implicit assumption is that this waveform, in the analog 
domain, is made up of perfect square pulses with amplitude 
either +1 or —1. Hardware constraints, however, mean that 
a real waveform cannot switch polarity instantaneously and 
will encounter shape distortion. A non-zero time r is required 
to switch polarity and is dictated by the circuits encountered 
in ADC architectures (7] Ch. 4]. The transitions therefore 
occur over this time-scale, and the square waveform can be 
viewed as passing through a low-pass filter with a bandwidth 
proportional to 1/r. One implication is a reduction of the 
energy captured in the measurements that depends on r and 
the number of transitions in the waveform. For a larger r, or 
for more transitions in the waveform, less energy is captured 
in the measurements. 

Over 30 years ago a similar problem affected the peak 
detection of binary signals written on magnetic media. In mag- 
netic recording, data is recovered by passing a read head over 
the media; a higher recording density means there is greater 
interference between the read-back voltages of adjacent bits. 
To reduce distortion in the read back voltages, Tang and Bahl 
introduced Run-Length Limited (RLL) sequences |8|. Run- 
length constraints specify the minimum separation d and the 
maximum separation k between transitions from one symbol 
to another (say +1 to —1). Tang and Bahl proposed using 
these RLL sequences to increase the number of bits written 
on the magnetic medium by a factor of d + 1 without affecting 
the read-back fidelity. Note that RLL sequences, compared 
to unconstrained sequences, require a longer length to store 
the same amount of information. Tang and Bahl nonetheless 
observed that for certain RLL sequences the fractional increase 
in length is smaller than d + 1, leading to a net increase in 
recording density because the allowed closer spacing of the 
physical bits (on the medium) overcomes the increase in bit- 
sequence length. The reader may refer to (9) for further details 
and a nice overview on this topic. 

B. Our Contributions: Constrained Random Demodulation 

In this paper, we make two major contributions to the area of 
sub-Nyquist sampling for signals with sparse spectral content. 
Our first contribution is to apply the lessons learned from 
magnetic recording to the RD. Specifically, we replace the 
modulating waveform of the RD with a (d, k) -constrained 



waveform generated from an RLL sequence (see Fig. [2]). We 
refer to such a sampling system as a Constrained Random 
Demodulator (CRD). The use of an RLL sequence reduces 
the average number of transitions in the waveform by a 
factor of d + 1, which results in an increase in the signal 
energy captured by the hardware. From another viewpoint, 
if we fix the acceptable energy loss (or average number of 
transitions in the waveform), then using an RLL sequence 
allows a larger input signal bandwidth. We do, of course, pay 
a price: an RLL sequence introduces statistical dependence 
across the waveform. Our first major contribution is therefore 
establishing that the CRD still enjoys some theoretical guar- 
antees for certain choices of waveform. In fact, we explicitly 
show that the power spectrum of the waveform is the key 
to understanding these guarantees and, hence, to choosing 
the best RLL sequence. Further, we outline a tradeoff in 
acquirable bandwidth versus sparsity of the input signal and 
show through numerical simulations that a 20% increase in 
the bandwidth can be handled by the CRD with a negligible 
decrease in average performance. Our work here builds upon 
our preliminary work in (TO), (TTJ that was primarily limited 
to introducing the idea of the CRD along with Theorem [T] 
(without proof). 

Remark 1. Heuristically, the theoretical guarantees in this 
paper rely on two things: (i) each (active) tone leaves an 
identifiable signature that can be extracted from the measure- 
ments and (ii) the measurements capture a significant amount 
of energy of each tone. We will show that the identifiability 
depends on the modulating sequence power spectrum. Once 
this is established, we would further like to maximize the 
captured energy. Since an RLL waveform leads to an increase 
in the captured energy because of the switching constraints 
previously discussed, its use in a hardware implementation 
will lead to improved performance as long as it satisfies the 
identifiability criterion. 

Our second contribution is laying down the foundations of a 
concept that we call Knowledge-Enhanced Compressive Sens- 
ing (KECoM) for sub-Nyquist sampling, which we prelimi- 
narily explored in (TTJ with limited numerical experiments. 
In the context of the CRD, the principle of KECoM assumes 
that some tones in the input signal are statistically more likely 
to appear than others. An immediate application of this is a 
spectrum sensing problem where some regions of the spectrum 
are assigned a higher detection priority than others, but none 
are deemed uninformative. We show through numerical simu- 
lations that the distribution of the tones in the input signal has 
a profound effect on the reconstruction of input signals from 
samples collected using a CRD. Specifically, we show with 
phase transition plots |T2) that if the prior distribution over the 
tones matches the power spectrum of the RLL sequence used 
by the CRD, then the reconstruction performance improves 
when compared to a uniform distribution over the tones. Note 
that (13), (T4| have also recently explored ideas along similar 
lines, albeit for a different class of sequences. In contrast to 
(13), (T4| , we provide a theoretical analysis and, additionally, 
a comprehensive numerical analysis of RLL sequences in the 
RD by examining the phase transition plots. 
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(a) An unconstrained sequence 
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(b) An RLL sequence with (d,k) = (1, 4) 

Fig. 2. Comparing an RLL sequence to an unconstrained sequence: The 
unconstrained sequence can switch from one level, to the other, and back 
without limitation. The RLL sequence, on the other hand, remains at a level 
for at least d + 1, and at most k + 1, time instants after it changes levels 
and cannot switch back right away. The (analog) modulating waveform is 
generated using a shifted square pulse with the appropriate sign from the 
sequence. 



C Other Sub-Nyquist Sampling Schemes 

The work of Rife and Boorstyn (T5) in the mid-70 's is an 
early example of a successful sub-Nyquist sampling scheme. 
Their goal was to take samples of a sinusoid at a sub-Nyquist 
rate and then perform parameter estimation to determine the 
amplitude, frequency, and phase of a single, unknown tone. 
They also extended their work to the case of multiple tones 
in a large bandwidth [16]. Their work, however, becomes 
intractable when considering more than a couple tones. This 
is an early example of what has become known as compressed 
sensing of sparse signals. Compressed Sensing (CS) is the 
systematic exploration of sparsity as a prior model for input 
signals and recovery of these signals from a small number 
of linear measurements (17). It has produced many analytical 
tools and algorithms for signal recovery. In addition to the RD, 
several other sub-Nyquist sampling architectures have taken 
advantage of ideas from CS including Chirp Sampling fT8) 
and Xampling [79). 

While the RD considers a bandlimited input signal model 
with few active tones, several other classes of signals have 
been considered in the literature with the goal of finding more 
efficient sampling methods. One such class contains signals 



with so-called "Finite Rates of Innovation" (20). Signals 
belonging to this class can be described by a finite number 
of degrees of freedom in a given time interval, and it has 
been shown that they can be reconstructed from a small 
number of samples that is proportional to the degrees of 
freedom in that time interval. Another class constitutes signals 
in "shift-invariant subspaces." These signals are composed of 
a superposition of shifted 'generator' functions (e.g., splines 
or wavelets); see |2T| for a nice overview of this signal class. 
In (22) and (23), this signal model is shown to provide an 
alternative to the bandlimited signal model; in particular, it 
allows the reconstruction of signals belonging to Sobolev 
spaces with an approximation error that scales polynomially 
with the sampling period. 

One possible drawback to utilizing the RD for sampling is 
its assumed discrete-frequency signal model (cf. Section [II]). 
Specifically, the RD assumes that the input signal can be 
described by a discrete set of integral frequencies, while 
real-world signals are likely to contain tones off this grid. 
While this signal model might not entirely describe real- world 
signals, the effectiveness of the RD architecture has been 
successfully demonstrated in the lab (24), (25) . To address 
signals with tones not conformant to the integral-frequency 
assumption, we consider energy leakage in the frequency 
domain. A tone that does not fall exactly on the assumed 
frequency grid will leak energy across several tones due to 
the inherent windowing. The result is that a signal which is S- 
sparse in the analog domain becomes (aS) -sparse after being 
sampled, where a > 1. Other schemes, such as Xampling 
(T9) , offer an alternative approach assuming a different signal 
model; the pros and cons of both systems are examined in 
(26) . While our focus in this paper is exclusively on the RD, 
we believe that our contributions could have implications for 
other sub-Nyquist architectures. Specifically, the Xampling 
architecture uses modulating sequences similar to the ones 
used in the RD/CRD, and we believe that RLL sequences 
could benefit the Xampling architecture as well. A detailed 
analysis is, however, beyond the scope of this paper. 

We would also like to point to a possible utility of RLL 
sequences in the NUS. The implementation described in (4) 
requires a minimum and maximum spacing between sample 
points while the analysis in (5) assumes the sample points 
are uniformly random without any constraints. The constraints 
in (4) can thus be described by an RLL sequence made up 
of 0's and l's with l's representing sampling points. We 
feel this interpretation of the limitations in [4] can help us 
mathematically analyze the architecture in [4], but a detailed 
investigation of this is beyond the scope of this paper. 

D. Organization and Notation 

The remainder of the paper is organized as follows. We 
first provide some background on the RD in Section [II] and 
then explain the challenges encountered by introducing RLL 
sequences into the RD architecture in Secti on [III] We then 
present our main theoretical results in Section |IIl| and two ex- 
amples of constrained sequences, one with bad results (Section 
|TV] ) and one with good results (Section [V]), to illustrate the 
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effectiveness of our analysis. Finally, in Sections VI and VII 
we present numerical simulations to offer some verification of 
the theoretical results. 

In the following we denote matrices with upper case roman 
letters and vectors with lower case roman letters. Scalars 
are denoted with italic lower case letters. We write * for 
the conjugate transpose of a matrix, vector, or scalar. We 
reserve the letters C and c in roman font to denote universal 
constants that could change values at each instance. For a 
matrix, A|q x q denotes the principal submatrix of A created 
from the columns/rows given in ft. We also use || • || for 
the spectral norm of a matrix and || • || max for the maximum 
absolute entry of a matrix. For a random variable B, let E[B] 
be the expectation and WB = (E|B| p ) 1 / p . Let P{-} denote 
the probability of an event. The short-hand j ~ r means 
(r - 1)W/R < j < rW/R for some W and R such that 
R divides W. 

II. Background: The Random Demodulator 

We start with a brief review of the RD architecture and 
highlight the key components that allow sampling of sparse, 
bandlimited signals and refer the reader to |6] for a thorough 
overview. To start, the RD takes samples at a sub-Nyquist rate 
R while retaining the ability to reconstruct signals that are 
periodic, (two-sided) bandlimited to W Hz, and completely 
described by a total of S <C W tones. These conditions 
describe a large class of wide-band analog signals comprised 
of frequencies that are small in number relative to the total 
bandwidth but are at unknown locations. 

Formally, the input signal to a RD takes the following 
parametric form 



f(t) = J2 a w e-", *e[0,l) 



(1) 



where ft C {0,±1, ±W/2- 1, W/2gis a set of S integer- 
valued frequencies and {a u : uj G ft} is a set of complex- 
valued amplitudes. Fig. [T] illustrates the actions performed by 
the RD. The input f(t) is first multiplied by 



w-i 



Pm(t) 



n=0 



[w w J 



(*), 



where the discrete-time modulating sequence e = [e n ] is a 
Rademacher sequence, a random sequence of independent en- 
tries taking values ±1 equally likely. Next, the continuous-time 
product f(t)-p m (t) is low-pass filtered using an "integrate and 
dump" filter[j Finally, samples are taken at the output of the 
low-pass filter at a rate of R <C W to obtain y[n). 

A. Matrix Representation of the Random Demodulator 

One of the major contributions of |6] is expressing the 
actions of the RD on a continuous -time, sparse, and bandlim- 
ited signal f(t) in terms of the actions of an R x W matrix 

3 We assume W is even. An appropriate change of the set fl would cover 
the case of W odd. 

4 It can be easily shown that the frequency response of this filter tapers off 
at high frequencies. Hence, it is a low-pass filter. 



$rd on a vector a e C w that has only S nonzero entries. 
Specifically, let x G denote a Nyquist-sampled version 
of the continuous-time input signal f(t) so that x n = /(^), 
n = 0, • • • , W — 1. It is then easy to conclude from ([]]) that 
x can be written as x = Fa, where the matrix 



1 



o/W 



(n,uj) 



denotes a (unitary) discrete Fourier transform matrix and a G 
has only S nonzero entries corresponding to the ampli- 
tudes, ciu, of the nonzero frequencies in f(t). Next, the effect 
of multiplying f(t) with p m (t) in continuous-time is equiva- 
lent in the discrete-time Shannon-Nyquist world to multiply- 
ing a W x W diagonal matrix D = diag(£o?£i> ' * ' i e w-i) 
with x = Fa. Finally, the effect of the integrating filter on 
f(t) • Pm(t) in the discrete-time Shannon-Nyquist setup is 
equivalent to multiplying an Rx W matrix H, which has W/R 
consecutive ones starting at position rW/R + 1 in the r th row 
of H and zeros elsewhere, with DFa|^] An example of H for 
R = 3 and W = 9 is 



H 



1 1 1 



1 1 1 



1 1 1 



The RD collects R samples per second, and therefore, the R 
samples collected over 1 second at the output of the RD can be 
collected into a vector y G C R . It follows from the preceding 
discussion that y = HDFa = <£rd ■ ol, where we have the 
complex-valued random demodulator matrix <I>rd = HDF. 

B. Signal Recovery 

Given the discrete-time representation y = <I>rd-«, re- 
covering the continuous-time signal f(t) described in ([T]) is 
equivalent to recovering the S- sparse vector a from y. In this 
regard, the primary objective of the RD is to guarantee that a 
can be recovered from y even when the sampling rate R is far 
below the Nyquist rate W. Recent theoretical developments 
in the area of CS provide us with greedy as well as convex 
optimization-based methods that are guaranteed to recover 
a (or a good approximation of a) from y (possibly in the 
presence of noise) as long as the sensing matrix $rd satisfies 
certain geometrical properties (XT). Tropp et al. (6) uses two 
properties from the CS literature to analyze the sensing matrix. 
The first is the coherence. The coherence fi of a matrix 

is the largest inner product between its (scaled to unit- 
norm) columns 0^: fi = max w ^ Q | 0a) I- Many recovery 
algorithms rely on the coherence of the sensing matrix being 
sufficiently small (27). The analysis in (6| in this regard also 
relies on the input signals conforming to a random signal 
model: given the signal model ([l}, the index ft is a set of S 
tones drawn uniformly at random from the set of W possible 
tones. Further, the coefficients a w are drawn uniformly at 
random from the complex unit circle. Under this signal model, 
^-sparse signals are recoverable with high probability if the 
sampling rate scales as R > C[S log W + log 3 W] |6J. 

5 Throughout this paper we assume that R divides W; otherwise, a slight 
modification can be made to H as discussed in |6]. 
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The second property used in (6) is the Restricted Isometry 
Property (RIP) (2§J. 

Definition 1. The RIP of order S with restricted isometry 
constant 5 s G (0, 1) is satisfied for a matrix with unit-norm 
columns if 

(l-«5 s )||x|||<||$x||I<(l + «5 s )||x||| 

or equivalently 

hi - * s (2) 

for every x with ||x|| < Here, ||x|| counts the number of 
non-zero entries in x. 

Note that RIP-based analysis tends to be stronger than the 
coherence-based analysis because the RIP provides a better 
handle on worst-case performance as well as on performance 
in the presence of noise (29[ Theorem 1]. It also provides 
stable recovery even if the signal is not exactly sparse, but 
is well-described by a sparse signal (so-called compressible 
signals) (29] Theorem 2]. We will therefore focus only on 
proving the RIP with the understanding that RIP automatically 
implies stable and robust recovery (see (T7J and the references 
therein for a more extensive list of results). 

In this paper, we use the "triple-bar" norm of | 6 ] to describe 
the RIP condition. Given a matrix A and set of indices Q C 
{0, . . . , W — 1}, the triple-bar norm captures the least upper 
bound on the spectral norm of any S x S principal submatrix 
of A: 

|||A|||= sup ||A| nxn ||. 0) 

\Q\<S 

It can easily be checked that 1 1 1 • 1 1 1 is a norm and that ^ is 
satisfied if and only if |||$*$ — I||| < Ss- 

The main result of |6 ] in this respect is that the RD matrix 
satisfies the RIP of order S as long as the sampling rate R 
scales as#>CSlog 6 W. 

III. Constrained Random Demodulator 

As described in the previous section, the RD uses a random 
waveform generated from a Rademacher sequence with transi- 
tion density of \ (on average, one transition every 2 Nyquist 
periods). However, limitations of analog circuits imply that 
each transition in the waveform results in a loss of energy 
compared to a waveform with ideal square pulses |7|. RLL 
sequences are an attractive way to generate waveforms with 
a reduced transition density of Additionally, we will 

later show that RLL sequences can also lead to superior 
performance for specific classes of input signals. We remind 
the reader that if an RLL sequence is used we call the 
resulting system a Constrained Random Demodulator (CRD) 
and denote the corresponding system matrix as $crd = HDF 
where D contains an RLL sequence e instead of a Rademacher 
sequence. The properties of the Rademacher sequence, in 
particular independence, are central to the analysis of the RD 
in (6) ; we therefore must carefully consider the impact of using 
an RLL sequence that is inherently correlated. 



The strength of (6) is that it shows that the RD matrix 
satisfies the RIP with high probability, allowing strong guar- 
antees to be made about the recovery of signals sampled 
with the RD. The RIP is satisfied primarily because of three 
properties of the RD matrix: (i) the Gram matrix averages 
(over realizations of the modulating sequence) to the identity 
matrix, (ii) the rows are statistically independent, and (iii) the 
entries are uniformly bounded. All three properties rely on the 
independence of the modulating sequence. In the CRD, we 
have to deal with dependence across e. Nevertheless, the last 
two properties are handled relatively easily. Specifically, if we 
can find some distance between entries in e such that any two 
entries, when separated by this distance, are independent, then 
we can partition the rows of $crd (or entries of e) into sets of 
independent rows (entries). We can then find bounds similar 
to those found in (6) for these sets and take a union bound 
over all the sets to obtain the desired properties. 

A. Maximum Dependence Distance 

To make the previous discussion more concrete, recall that 
the (r, uo) entry of <E>crd is 

Prut =^2£jfjLJ- (4) 

If e is an independent sequence, then each cp ruj is a sum 
of independent random variables, and each row of $crd is 
independent. However, if we use a correlated sequence then 
the rows may not be independent, and it is important to know 
the extent of the dependence within the sequence. 

Definition 2. The Maximum Dependence Distance (MDD), 
I, for a modulating sequence e is the smallest t such that 
ElejEj+k] = for all j and \k\ > if\ 

Now, if we define p = 1)1 < (£ - 1), then any 

two rows of <£crd separated by at least p + 1 rows will be 
independent. Given p and £, we can now partition the rows of 
$crd into p + 1 subsets where the rows in each subset are 
independent]^] Using this partitioning scheme, we can proceed 
with the analysis of independent rows and finally take a union 
bound over all subsets. Using I, we can similarly show that 
each entry of $crd is uniformly bounded. The details are in 
Appendices [A] and [B] 

B. The Gram Matrix 

Analysis of the Gram matrix of $crd is a little more 
involved. To start, denote the columns of $crd by ^ and note 
that the (r, uj) entry of $crd is given by ([?]). The Gram matrix 
is a tabulation of the inner products between the columns and 
(as calculated in [6]) is given by ^cpj^crd = I + X. Here, 
the (a,oj) entry of X is the sum 

Xaoj = ^SjSkVjkfJafkoj (5) 

6 Note that this is a correlation distance, but that for the bipolar sequences 
of our concern, uncorrected implies independent. 

7 We assume for convenience that p + 1 divides R. This can be readily 
relaxed by adjusting the size of the last subset. 
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where [so, • • • , e^-i] = s is the modulating sequence, rjjk = 
(hj,hk) with hj being the jth column of H, and fj a is the 
(j,a) entry of the (unitary) Fourier matrix F. Expanding rjjk, 
we have that 



1, 



rr < j,k< ¥(r + l) 



0, otherwise 



(6) 



for each r = 0, ••• , i? — 1. We see that rjjk acts as a 
'windowing' function in the sum. In expectation, the Gram 
matrix is E[$£ rd <I>crd] = I + E[X] = I + A where we have 
identified A = E[X] with entries 



^2Vjkfjafk U ^[£j£k]' 

3^k 



(7) 



Note that A is completely determined by the autocorrelation 
of e. If an independent e is used (such as for the RD) then 
ElsjSk} = for j + k, A = 0, and E[^ D $ RD ] = I. In [§, 
this relation is taken to mean that the columns of <£rd form an 
orthonormal system in expectation. This can of course never 
be true if R < W, and the RIP is shown by bounding the 
deviation from this expectation in 1 1 1 • 1 1 1 . 

If e has non-zero correlations, however, then A does not 
disappear and the expectation of the Gram matrix is not the 
identity matrix. To establish the RIP in this case, we still need 
to bound the deviation of the Gram matrix from the identity 
matrix, but now we must also contend with A. Nevertheless, 
if this matrix is small in 1 1 1 • 1 1 1 then our task is easier. Since 
the autocorrelation of e determines A, we want to choose a 
e that produces small 1 1 1 A 1 1 1 . In particular, recall that the RIP 
of order S is satisfied if 



1$ 



CRD 



$CRD - I||| < Ss- 



(8) 



Expressing I = E[$c RD <1>crd] - A, the left-hand side of ^ 
can be bounded as 



I^CRD^CRD 



■II 



= IH^CRD^CRD 
< IH^CRD^CRD 



E[$£rd$crd] 
e [$crd$crd]| 



(9) 



due to the triangle inequality. Therefore, to show the RIP we 
must upper bound the two terms in ([9}. The first term will 
be bounded using an argument very similar to that used in 
1 6] but modified to deal with the correlations in e. Since the 
second term, |||A|||, is determined by the autocorrelation of 
e, we will provide a bound on |||A||| that directly relates to 
the choice of e. 



C. Main Results 

The preceding discussion on £ and A enables us to make 
a statement about the RIP of a CRD that uses a correlated 
modulating sequence. 

Theorem 1 (RIP for the CRD). Let $ C rd be an R x W 
CRD matrix using a modulating sequence with maximum 
dependence distance £ and A (as defined by ^). Next, pick 



5,5' G (0,1) such that 5' < S — |||A||| and suppose that R 
divides W, £ divides ^ Q and R satisfies 



R>£ 6 5 f - 2 -C-Slog*(W) 



(10) 



where C is a positive constant. Then with probability 1 — 
0(W~ 1 ) the CRD matrix $crd satisfies the RIP of order S 
with constant Ss < 5. 

The proof is provided in Appendix [A] As with the RD, the 
sampling rate R must scale linearly with the sparsity S of the 
input signal and (poly logarithmically with the bandwidth W. 
The sampling rate, however, also depends on the maximum 
correlation distance £ and on the matrix A. Both of these are 
determined by the choice of e. If we choose an independent 
(i.e., unconstrained) e, then £ = 1, A = and we get back the 
RD result of | 6 |. For a constrained e, we must restrict ourselves 
to sequences such that A satisfies |||A||| < 1. Obviously we 
would like to find sequences for which both £ and 1 1 1 A 1 1 1 are 
as small as possible. With this criterion in mind, in the next 
two sections we will examine two classes of sequences to see 
how well they work in the CRD framework. 

In addition to the RIP, we also use the coherence of the 
sensing matrix to provide results for the random signal model 
described in Section [n] In the sequel, we use a matrix from 
(30) to capture the dependence in e. For a sequence e, define 
the triangular matrix T of "mixing coefficients" as T = {7^ } 
with 

'0, 
1, 

»( £ . = + i| £i = _i) 

P(e i = +l|e i = +l) 



i > j 
i = j 



% < j. 



Theorem 2 (Recovery under the random signal model). Sup- 
pose that the sampling rate satisfies 



R>C£ 2 [S\ogW + \og 3 W] 



(11) 



for some positive constant C, and that R divides W and 
divides Also suppose that W satisfies 



log 2 W 



< 



C 



(12) 



'w ~ lev^iJliril 2 " 

Now, let a be a vector with S non-zero components drawn 



according to the random signal model in Section II-B and 
let ^>crd be an R x W CRD matrix using a stationary 
modulating sequence with maximum dependence distance £. 
Let y = $crd- ct be the samples collected by the CRD. The 
solution to the convex program 

a = argmin ||v||i subject to $crdv = y (13) 

V 

satisfies a = a with probability 1 — G(W~ l ). 

The proof is given in Appendix [B] The bounds offered here 
are similar to those in [ 6 ] with the rate scaling linearly with the 
sparsity S and logarithmically with the bandwidth W but more 
tightly constrained by the factor of £ 2 and the extra constraint 
on W. 

8 Throughout this paper, these requirements can be readily relaxed through 
meticulous accounting of terms in the analysis. 
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Because the choice of modulating sequence plays such a 
pivotal role in our analysis of the CRD, a natural question 
is what types of sequences offer good performance and what 
types offer bad performance. In the sequel, we analyze two 
different types of sequences: one for which Theorems [T] and 
[2] (approximately) apply, and one for which they do not. 
Numerical experiments in Section [Vl] then show that these 
results appear to be tight. Nevertheless, we must stress two 
points here. First, Theorems [T] and [2] are only sufficient 
conditions on the sampling rate and modulating sequence; 
a different analysis could offer stronger results. Second, the 
modulating sequences that are shown to work well numer- 
ically satisfy Theorems [T] and [2] in an approximate sense. 
From an engineering perspective, however, the approximation 
(discussed in Sec. [V]) is well justified and validated further by 
the numerical experiments. 

IV. Repetition-coded Sequences 

We begin by analyzing sequences that satisfy the RLL 
constraints and have a small value of £ but have a large 1 1 1 A 1 1 1 
and do not satisfy Theorems [T] or [2] 

Definition 3. A repetition-coded sequence (RCS) is generated 
from a Rademacher sequence by repeating each element d 
times. Let the repetition-coded sequence be denoted as €rcs = 



w 

d+l 



1 be a 



[e ,...,ew-i] and let [s {d+1)n ], < n < 
Rademacher sequence. We then require for 1 < i < d and 
each n that 

£(d+l)n = £(d+l)n+i- (14) 

Such a sequence switches at a rate of W/(d + 1). We 
discuss these sequences because they are one of the simplest 
forms of RLL sequences and also have very small MDD. 
To see this, notice that each group of repeated elements, 
[£(d+i)n+i] f° r < i < d, is completely dependent while 
independent of every other element in the sequence. The 
maximum dependence distance is £ = d + 1. 

Since the performance of the CRD also depends upon 
1 1 1 A 1 1 1 , we need to bound 1 1 1 A 1 1 1 and understand its behavior. 
To start, assume that R divides W and £ divides ^ and denote 
by £rcs an RCS. Let $rcs be a CRD matrix that uses £rcs 
as the modulating sequence: $rcs = HDF where D contains 
£rcs on its diagonal. It is convenient to rewrite the entries of 
A, given in ([7]), in this case as 

A au , = ^ Vj(j+k)fjaf(j+k)LJ^[£j£j+k]- 

To calculate 1 1 1 A 1 1 1 , it will be convenient to focus on the Gram 
matrix A = A* A, which has entries 



w 



An 



where 



E 



w , 



™F(j,w)F*(j,a), co-a=^q 
otherwise 

(15) 



Vj (j+m) fmoo ^ \Sj £j + m ] 



(16) 



and q = 0, ±1, ±{l - 1). 

We bound |||A||| by studying the entries of A. To do this, 
recall from the definition of the spectral norm that for a matrix 



A we have ||A|^ X ^|| < l|A|n x n|| for any ft' c ft. We can 
therefore lower bound 1 1 1 A 1 1 1 in this case by using ft such that 
|ft| = 1, i.e., S = 1. For S = 1, |||A||| is the square root of 
the maximum entry on the diagonal of A. Applying ([14]) to 
the autocorrelation in fl6| ), it is straightforward to show that 

d d-j d-j 



w 



I ^« 



(17) 



m——j k——j 



and that ([17]) is maximized by u = 0. This results in A ,o = d 2 
and, in the case of an RCS, for any S that 1 1 1 A| 1 1 > 1. Finally, 
in the context of Theorem [2] recall ([5]) for the case of a = uj. 
In this case, it is easy to see that \x au \ > d + 1. Theorem [2] 
on the other hand, relies on bounding ||X|| max close to (the 
details are in Appendix [B]) and this obviously cannot be done 
for an RCS. 

We see that Theorems [T] and [2] do not hold for $rcs- Al- 
though we do not have converses, we demonstrate the tightness 
of our theory for an RCS through numerical experiments. 
For this, we calculate the minimum and maximum singular 
values of the submatrices over an ensemble of matrices $rcs 
generated using an RCS with d = 1. The submatrices are 
chosen by picking S = 10 columns at random from $rcs- 
The results are presented in Fig. |3(a)| where we see the 



minimum singular values are often at or very near zero for 
some values of R, indicating the RIP is either not satisfied 
or barely satisfied with an extremely small isometry constant. 
Further, we show through numerical experiments in Section VI 
that reconstruction performance is in general poor for $rcs- 

V. Wide-Sense Stationary RLL Sequences 
We have seen in the previous section that $rcs does not 



satisfy the requirements for Theorems 1 or 2; Fig. 3(a) offers 
further evidence that $rcs does not satisfy the RIR We 
therefore do not expect it to perform well in the sampling 
and reconstruction of sparse signals. In this section, we show 
that a different class of RLL sequences (9j, although more 
complicated than an RCS, produce measurement matrices with 
better conditioned submatrices and perform much better in the 
sampling and reconstruction of frequency-sparse signals. 

We begin by examining the RIP for a modulating sequence, 
e, that is wide- sense stationary with autocorrelation function 
R £ (m) = E[£j£j +m ]. We assume the maximum dependence 
distance is £, so R £ (m) = for \m\ > £. Under these as- 
sumptions, we want to upper bound 1 1 1 A 1 1 1 . It will be easiest to 
focus on the Gram matrix ( [T5] ). In this case, we can also rewrite 
{[6]) in terms of R £ {m)\ F(j,u) = Em^o Vj(j+m)fmouRs(m) 
which we refer to as the "windowed" spectrum because 
Vj(j+m) can be viewed as a "windowing" operation on R £ (m). 
From ([6]), we see that the width of the window is W/R, which 
will be quite large as W increases (and R scales as in ([TQ|). 
F(j,uS) also looks very much like the power spectrum of e\ 
F e\u) = Em Re(m)e~^ muj . Note that F £ (uj) is real-valued. 
The significant differences in F{j,uS) are the exclusion of 
m = in the sum, a scaling by from f muj , and the 

windowing by ^( J+m ). If W/R ^> £ then the windowing has 
negligible effect in F(j,u) because R £ (m) = for \m\ > £; 
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Singular Values for RLL seqs:R,W=1024,S=10,d=1 



1:1 i 



50 100 150 200 250 300 350 400 450 500 
R 

(a) The singular values very near to zero represent poor 
conditioning of the submatrices of <E>rcs- 

Singular Values for RLL seqs:R,W=1024,S=10,d=1 



i I I: 



50 100 150 200 250 300 350 400 450 500 
R 

(b) The singular values are bounded away from and 2 
indicating good conditioning of the submatrices of <E>mrs- 

Fig. 3. The minimum and maximum singular values of submatrices with 10 
randomly chosen columns averaged over 1000 realizations of the measurement 
matrix. The error bars represent 2 standard deviations above and below the 
average value. 



F(j, uj) and ( [T5] ) both simplify greatly in this case. To see this, 
first notice that because e is a bipolar sequence R £ (0) = 1, 
and F £ (uj) = ^ m/0 R £ (m)e-^ muJ + 1 where 



^2 Re{m)e~ 

m/0 



Feifj) ~ 1. 



(18) 



We call F £ (uj) the reduced spectrum of e. Under the assump- 
tion that W/R > £, F(j, uj) reduces to W~ 1/2 F £ (uj) for all j 
except j satisfying \rW/R + j\ < t for r = 0, • • • ,R—1 (all 
but a fraction 2i^). This fraction becomes increasingly small 
as W grows. In this case, the entries of A are approximately 



w-i 



= 5 au F e (a)'F e (u>) 



where S auj is the Kronecker delta. In words, A is approxi- 
mately a diagonal matrix with the square of the reduced spec- 
trum on the diagonal: A « diag[(F e (u;)) 2 ], and the eigenvalues 
of A are approximately (F £ (uj)) 2 . Consequently, the singular 
values of A are approximately \F £ (uj)\. We therefore have 
||A|| w max w \F £ (uj)\. Now, the spectral norm of a submatrix 
is upper bounded by the spectral norm of the matrix, so we 




••• + (^^) 

Fig. 4. State diagram of the Markov chain generating an MRS (see Definition 
|4j. The transition probabilities are symmetric in the sense that P(i+k)(j+k) = 
Pij where the sum is taken modulo 2k. The top half outputs the symbol +1 
while the bottom half outputs -1. 




Fig. 5. Log-magnitude plot of the autocorrelation of an MRS. The 
autocorrelation experiences geometric decay as m — > 00. The rate of decay 
is primarily dependent on d. 



finally obtain 



IAIN < MAI 



max|F £ (cj)|. 



(19) 



We now have a way to estimate whether or not a stationary e 
is well- suited for use within the CRD. A stationary e whose 
spectrum is bounded within (0, 2) is good; one with F £ (uu) = 1 
Vcj is best. 

We now present some examples to make this discussion 
clearer. First, consider an independent (unconstrained) e, such 
as the one used in the RD. In this case, F £ (uj) = 1 and 
F £ (uj) = Vco>. The Gram matrix exactly disappears (A = 0) 
and A = confirming our previous discussion. Next, we 
consider the RLL sequences described in (8) and (9). To un- 
derstand how well these sequences will work in the CRD, we 
need to calculate the power spectrum of sequences generated 
from the Markov chain in Fig. [4] 

A. Power Spectrum of Markov Chain RLL Sequences 

To begin, we explicitly describe the RLL sequences in (9). 

Definition 4. We call a (d, fc) -constrained RLL sequence that 
is generated from the Markov chain whose state diagram is 
found in Fig. [4] a Markov- generated RLL Sequence (MRS). 
Denote such a sequence as £mrs = [£or" : e w-i] with 
Ek £ {+1,-1}. The transition probabilities are defined by the 
matrix P = [p^] where p^ is the probability of transitioning 
from state i to state j. The also satisfy p^+m^+m = p^ 
where the sum is modulo 2k. P is of course a stochastic matrix 
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with rows summing to 1. The average of the symbols output 
from each state i are collected in the vector b = {bi}. The 
stationary distribution of the states is denoted by tt = [71^] and 
satisfies tt t = tt t P. 

Having defined these MRS, we have from [31] that their 
autocorrelation function is 



Rs(m) 



a T P m b 



where a T = b T - diag[7ri, • • • , 7^] and R £ (—m) = R £ (m). 
To understand the performance of an MRS within the CRD, 
we need to understand the behavior of R £ (m) as m increases. 
Since P is a stochastic matrix, we can make use of the theory 
of non-negative matrices to understand how R £ (m) behaves. 
First note that b is orthogonal to w, where w = [1, 1, • • • , 1] T , 
and that a T b = 1. Since P is a stochastic matrix, its 
second largest eigenvalue A2 satisfies A2 < 1. Making use 
of (32j Theorem 8.5.1], we can bound the autocorrelation (in 
magnitude) as 



\Re(m)\ = |a T P m b| < AJp. 



(20) 



We see that \R £ (m)\ experiences geometric decay, at a 
rate determined by A2. This is confirmed in Fig. [5] where 
101og 10 \R £ (m) \ is plotted for several pairs (d, k). Notice that 
the rate of decay (in magnitude) is smaller for larger values of 
d and larger for larger values of k, and the curve is roughly 
the same for k = 20 and k = 00. These facts can be directly 
tied to the eigenvalues of P in each case. 

To evaluate the performance of an MRS within the CRD, we 
must evaluate the MDD and the matrix A. Looking first at the 
MDD, we use ([20} and the fact that A 2 < 1 to establish that 
linv^oo \R £ (m)\ = and, hence, for any £ > 0, \R £ (m)\ < £ 
for all rn > M where M = M(£) < 00. Though we cannot 
guarantee that an MRS becomes completely uncorrelated for 
a finite M, we can make £ as small as we want so that the 
sequence is nearly uncorrelated for large enough M. In this 
case, we can take the MDD to be £ « M(£) for some small 
£. In other words, an MRS satisfies the setting of Theorem 
[T] in an approximate sense. We believe this is justified from 
an engineering perspective because the correlation can be 
made very small; the numerical experiments in Section VI 
add further justification to this. 

Next, we estimate 1 1 1 A 1 1 1 from the reduced spectrum of the 
MRS. Using ([19]) and t « M(£) from above, we have that 
|||A||| < ||A|| « max^ \F £ (u)\ where F £ (u) is the reduced 
spectrum. We emphasize again that £ can be made as small 
as we like at the expense of a larger i. Consequently, it can 
be argued that an MRS that satisfies 

max \F £ (uj)\ < 1 

leads to a matrix <E>mrs that approximately satisfies the RIP 
by virtue of Theorem [T] 

Turning to Theorem [5J we must show that ||r|| is bounded 
independent of W. It is easy to show that for i < j, 




Frequency Index: co / W 

(a) MRS with d = 1 and k = 20 



CD 

"O 1-2 



Frequency Index: co / W 

(b) Rademacher seqeunce 




1%j = V\Re(j-i)\/2 < 



Frequency Index: co / W 

(c) RCS with d = 1 

Fig. 6. Power spectrum of an MRS, a Rademacher sequence, and an RCS. 
For the MRS, signals in region 1 get more energy in the measurements than 
signals in region 2. The Rademacher sequence is ideal for sampling any 
frequency sparse signal. An RCS is not well suited for sampling signals with 
any high frequency content. 



for an MRS. It is then also straightforward to show (see, e.g., 
the discussion after [30, Proposition 1]) that 

||r||<iA/2(i-A^ /2 ). 

Since | |T| | is independent of W, we can make W large enough 
so that fl2] ) is satisfied and Theorem [2] is approximately 
satisfied. 

Our argument for the use of an MRS within the CRD 
makes use of some approximations. To demonstrate the va- 
lidity of these approximations, we consider an MRS with 
(d,k) = (1,20). The spectrum of this MRS is shown in 
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(a) Probability of successful reconstruction over 1000 in- 
stances of <E>rcs with d = 1 
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(b) Probability of successful reconstruction over 1000 in- 
stances of <E>mrs with d = 1 and k = 20 

Fig. 7. The RCS does not offer good performance and, in fact, fails quite 
often. The MRS offers comparable performance to the Rademacher sequences 
of the RD. 



Fig. 6(a) From this figure, we see that max w \F £ (uj)\ w 0.9 
corresponding to uj = ±0.5. Our theory, therefore, predicts 
that the matrix $crd in this case satisfies the RIP. To verify 
this, we calculate the average minimum and maximum singular 
values of the submatrices of $crd and present the results 



in Fig. 3(b) for submatrices containing 10 columns. We see 
that as R decreases, the singular values approach and 2 but 
remain bounded away from them. In Section [VI] we carry out 
numerical reconstruction experiments to further validate our 
theory. 

VI. Random Demodulator vs. Constrained 
Random Demodulator: Numerical Results 

In this section we numerically contrast the performance of 
the RD with that of the CRD. In the case of the CRD, we focus 
on measurement matrices built using the RCS and MRS. The 
results here are obtained using the YALL1 software package, 
an ti -solver using alternating direction algorithms (33j. We 
first examine the use of an RCS and show that a CRD using 
these sequences gives unsatisfactory results. 



Recall that we have argued in Section [IV] that $crd using 
an RCS does not satisfy the RIR Consequently, if we sample 
a sparse signal using such a measurement matrix and attempt 
to reconstruct it, we expect to get poor results. This is indeed 



the case in our numerical experiments as shown in Fig. 7(a) 
To produce these results, we hold the sampling rate constant 
at R = 50 and vary the bandwidth W. It is particularly 
noteworthy that sampling and reconstruction fail most of the 
time at W = 100 and W = 200. Note that the RCS performs 
relatively better at W = 150, owing to the splitting of some 
repeated entries of the RCS between successive rows of the 
composite matrix HD. 

We then examine sampling with a CRD that uses an MRS 
with d = 1 and k = 20 and show that it produces results 
similar to those for the RD using a Rademacher sequence. 
Recall that we have argued in Section [V| the usefulness of 
RLL sequences generated from the Markov chain of Fig. [4] 
in the context of the CRD. Fig. 7(b) validates this assertion 
and shows the empirical probability of reconstruction if we 
sample sparse signals with $crd that uses these sequences. 
The baseline for comparison is of course the RD. The figure 
shows that the performance using an MRS is very similar to 
the performance using the Rademacher sequences of the RD. 
In fact, the CRD allows us to tradeoff between sparsity, band- 
width, and recovery success. In particular, if we concentrate on 
the RD curve at W = 250 and the CRD curve at W = 300, 
we see that at a 90% success rate, we only pay a sparsity 
penalty of 2 13%) by using the CRD. At the same time, 
however, we have gained an advantage in bandwidth, W, of 
20%. Comparing the CRD curve at W = 300 to the RD curve 
at W = 150 we see that at a 90% success rate, we incur 
approximately a 28% sparsity penalty for a 100% increase in 
bandwidth. Other tradeoffs can be seen at different success 
rates, but it is reasonable to argue that most applications will 
operate AID converters in the high success rate regions. At 
lower success rates, the advantage is even greater for the 
CRD. While our analysis concentrates on a high success rate, 
analysis at lower success rates could prove useful for future 
work. 

VII. Knowledge Enhanced Sub-Nyquist Sampling 

In this section, we argue that the performance of a CRD 
can be enhanced by leveraging a priori knowledge about the 
signal. We notice two operations in Fig. [T] that are central 
to the functioning of the RD/CRD: the modulation by the 
random waveform and the subsequent low-pass filtering. The 
low-pass filtering operation allows the RD/CRD to operate 
at the sub-Nyquist rate R, while modulation by the random 
waveform — which smears the input signal tones across the 
spectrum, including in the low-pass region — results in a unique 
signature of each tone within the low-pass region. Theorem [T] 
states the sufficient conditions for uniqueness to hold for all 
possible input signals, and we explored in Section [V] how the 
RIP depends on the power spectrum of the random sequence. 
In addition to uniqueness of each tone's signature in the low- 
pass region, the performance of the RD/CRD depends on the 
energy smeared into the low-pass region because tones with a 
low-energy signature will be harder to recover. 

Note that the modulation by the random waveform in time is 
equivalent to a convolution in the frequency domain. There- 
fore, the power spectrum of the random waveform tells us 
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how much energy from each tone on average is smeared into 
the low-pass region (and thus collected in the measurements). 
Inspection of fL9| ) tells us the RIP depends on the worst- 
case deviation from a flat spectrum. However, if we use an 
MRS within the CRD and the input signal is statistically 
more likely to contain low frequencies, then this additional 
knowledge about the signal can be leveraged to improve 
the reconstruction averaged over many signals and random 
waveform realizations. Note that this is a different "average 
case" setup than the one in Theorem [2] Here, we impose 
a nonuniform distribution on tones in the input signal. We 
show in this setting that the CRD can perform better than 
the RD, provided the statistical distribution of the tones is 
matched to the power spectrum of the MRS, because the CRD 
in this case will on average smear and capture more energy 
from the input tones in the low-pass region of the spectrum. 
In addition to the case of possessing prior knowledge about 
the input signal distribution, the exposition in this section is 
also of interest in other scenarios. Consider, for example, a 
spectrum sensing application in which one assigns a higher 
priority of detection to some regions and a lower priority of 
detection to other regions. Similarly, consider the case where 
one possesses knowledge about colored noise or narrowband 
interference injected into the signal. In both these settings, the 
CRD can be tailored through the choice of the modulating 
waveform to perform better than either the RD, which treats 
all spectral regions the same way, or a pure passband system, 
which completely throws away information in some spectral 
regions. We term such usage of the CRD that exploits prior 
knowledge a knowledge -enhanced CRD. 

Note that somewhat similar ideas have been briefly explored 
in |T3| and (14), but without the explicit examination of 
the uniqueness of tone signatures. Recent work on model- 
based compressed sensing also attempts to leverage additional 
a priori information in the signal model (34), but the focus 
there is exclusively on the reconstruction side instead of the 
sampling side. 

A. Phase Transitions of Reconstruction Success 

To verify our understanding of the knowledge-enhanced 
CRD, we have conducted extensive numerical simulations to 
compare reconstruction performance for signals sampled by a 
CRD (using an MRS) against the RD (using a Rademacher 
sequence). Our focus here will be on two classes of input 
signals. The first class is generated by drawing a sparse set of 
tones uniformly at random; the second class is generated with 
a distribution on the tones that matches the power spectrum 



Phase Transition - RD uniform 



of an MRS with (d, k) = (1,20) (see Fig. |6(a)) . We also 
focus on two measurement matrices: the RD and the CRD 
using an MRS with (d,k) = (1,20). Recall, the RD uses 
an (unconstrained) Rademacher sequence. The sequence is 
comprised of independent terms, resulting in a flat spectrum 
(see Fig. |6(b)| ). Because the spectrum is flat, a Rademacher 
sequence will illuminate all tones equally well. That is to 
say, we expect good reconstruction performance for all sparse 
signals. On the other hand, the MRS used in the CRD has 
correlations between terms of the sequence that gives rise to 
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(a) RD with a uniform distribution on the input tones. 
Phase Transition - RD matched 




R/W 



(b) RD with a distribution on the input tones matched 
to the power spectrum of a (1, 20) RLL sequence. 
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(c) CRD with a uniform distribution on the input tones. 
Phase Transition - CRD matched 
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(d) CRD with a distribution on the input tones matched 
to the power spectrum of a (1, 20) RLL sequence. 

Fig. 8. Empirical reconstruction success as a function of S/R and R/W. 
The phase transition is the transition from to 1. 
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the spectrum in Fig. |6(a)| We see that the spectrum is close 
to 1 for the low frequencies (Region 1) and approximately 
0.1 at high frequencies (Region 2). If low-frequency tones are 
statistically more likely in the input signal, then we expect the 
CRD on average to capture more energy in the measurements 
and offer better reconstruction performance. Note, we do not 
consider the CRD using an RCS because we have shown in 



Section VI that the reconstruction performance is very poor. 
To understand why it is poor for an RCS, we can examine 
the spectra of these sequences. An RCS is not stationary 
but rather cyclo-stationary, so we calculate the spectrum by 
averaging over the cycle period. The resulting spectrum is 
shown in Fig. |6(c) for d = 1. The spectrum approaches zero 
at high frequencies, so we expect the CRD in this case to 
capture very little energy from high frequency tones in the low- 
pass region. Consequently, we also expect poor reconstruction 
performance. 

The results are displayed in Fig. [5] for the four combinations 
described above: two input signal classes and two measure- 
ment matrices. For these experiments, an RD or CRD matrix is 
generated using a random instance of the modulating sequence 
3000 times for each point (i.e., pixel) on the plot. The matrix 
is used to sample a new randomly generated ^-sparse vector, 
and reconstruction of the original vector from its samples is 
carried out using the YALL1 software package. Success is 
defined as the two vectors being equal to each other to 6 
decimal places in the norm. The results in Fig. [8] show 
that the RD performs (almost) equally well for the two input 
signal classes. On the other hand, the CRD performs much 
better for the second class of input signals. Additionally, the 
CRD suffers more at very small R/W ratios. 

B. Reconstruction in the Presence of Noise 

The phase transitions of Fig. [8] correspond to a noiseless 
setting, Here, we examine the results of reconstructing input 
signals from noisy samples, y = &a+y/p- w, where w is white 
Gaussian noise and p determines the noise powei[^] We plot the 
mean- squared error (MSE) of the reconstruction as a function 
of S/R and W/R and use the SpaRSA software package, 
which solves an £2 / £\ mixed-norm optimization termed lasso 
(35} for noisy reconstruction purposes [36[^J the results are 
shown in Fig. [9] Similar to the noiseless case, we see a sharp 
transition from low MSE to high MSE. The performance of 
the RD is also similar for each class of input signals while 
the CRD performs much better for the second class of input 
signals due to matching the prior to the power spectrum of the 
modulating sequence. 

C. Reconstruction of Signals with Non-Integral Frequencies 

The signal model ([]} assumes only integral-frequency tones. 
Real- world signals may contain non-integral frequency tones. 
These non-integral tones will 'leak' energy to several integral 
tones based on the implicit windowing operation from the 

9 The model y = &(a + y/p- w) yields similar results, but w as colored 
noise could offer interesting future work. 

10 SpaRSA is better suited for noisy reconstruction than YALL1. For the 
regularization parameter, we used 1.9\/2p log W. 



finite time assumption (t G [0, 1)). The windowing produces 
a convolution of the input tones and the window in frequency 
(37) but does not invalidate the signal model ([I]). Rather, the 
result is a scaling of the sparsity factor from S to aS, where 
a > 1 determines the extent of the leakage. Fig. [10| shows 
reconstruction results if non-integral tones in the input signal 
are allowed. Tones are drawn at random from [0, W) according 
to a distribution proportional to the spectrum in Fig. |6(a)| The 
coefficients in the input at the integral tones are determined 
by a Hamming window (in the frequency domain) centered 
at the location of the tone. Now, compare Fig. |10(a)| with 



Fig. [9(b)] (for the RD) and Fig. [10(5)1 with Fig. |9(dW(f or the 
CRD). Both plots look similar, but notice that Fig. |10| has S 
scaled by a factor of 16. This suggests that the penalty for 
considering leakage in (p} is roughly a factor of 16 in input 
signal sparsity. In the worst-case, this kind of 'mismatch' can 
seriously degrade reconstruction performance [ 38 1 . However, 
in our experiments we do not often see the worst case (a tone 
occurring halfway between two integral tones) and hence only 
see a manageable decrease in performance. 

VIII. Conclusions 

In summary, we have proposed the use of RLL sequences 
in the RD because of hardware constraints on generating 
high-fidelity, fast- switching waveforms. We have shown both 
theoretically and numerically that for a fixed switching rate, 
certain classes of RLL sequences offer an increase in the 
observable bandwidth of the system. Specifically, we showed 
that an MRS works well and an RCS does not. Insight into why 
each sequence succeeds or fails is found in the power spectrum 
of the sequence. Further, we have argued that matching the dis- 
tribution of tones in the input signal to the power spectrum of 
these RLL sequences improves performance, sometimes even 
beyond that of the RD. The most obvious future directions 
to take are a better theoretical understanding of knowledge- 
enhanced CRD and matching the modulating sequence to 
arbitrary distributions on the input tones. A more thorough 
understanding of the hardware system and the consideration 
of a more complex modulating waveform (e.g., with a pulse 
shape other than a square) would also be interesting and useful. 

Appendix A 
Restricted Isometry Property of the CRD 

To show that a CRD satisfies the RIP, we follow the proof 
technique of (6) for the RD with changes to account for 
correlations within e in our case. We begin by bounding the 
entries of $crd- 

Lemma 1. [A Componentwise Bound] Let <E>crd be an RxW 
CRD matrix, and let £ be the maximum dependence distance 
of the corresponding modulating sequence. When 2 < p < 
4 log W, we have 



E1$crd| 



< 



£■ 6 log W 



R 



and 



p<i«™>n ,>^ir. 
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Proof: We use the following Lemma of Tropp et al. (6| 
Lemma 5]. 

Lemma 2. [Bounded Entries - RD] Let <£rd be an R x W 
RD matrix. When 2 < p < 4 log W, we have 



and 



i|$RD||max > 



< 



6logW 



R 



on the context. Note that z r z* is a rank one matrix and 
that $r D $rd = Ylr=i z r z r- We now neeci me following 
proposition which is a corollary to (6j Theorems 16 and 18]. 

Proposition 1. Let &rf> be an R x W random demodulator 
matrix and z' r be an independent copy of z r . Define the random 
variable 



^RD 



$RD*$RD -^RD*$R D | 



We assume that R divides W and £ divides ^. We can Z RD ^to/ks 



write each entry of $crd as 



£ ofi 

U~ r )o 



(j~r) £ _i 



3^ 



(21) 



where [ej] is the modulating sequence, [fju] are the entries of 
the Fourier matrix F, and (j ~ r) m denotes all j such that 
j ~ r and (j mod t) = m. Note that each in ([21} is a 
Rademacher series containing W/Rl terms, and we proceed 
by applying the triangle inequality to ^2A\ \ 



ev™ = w E ^™ ^ E E? v 



(m) 
ru; * 



m=0 



ra=0 



Applying Lemma [2] to each entry in the sum, we have 



^||$crd| 



< 



£-1 

E 

m=0 



61ogW 



£R 



llogW 



R 



For the probability bound, we apply Markov's inequality. Let 

M= ||$CRD||max, then 



P{M > u} = ¥{M q > u q } < 
and choosing u = e 0,25 E 9 M, we obtain 

< e - log W 



E<?M 



M > 2 1 - 25 e - 25 



?logW 



(22) 



Finally, a numerical bound yields the desired result. ■ 
To complete the proof of Theorem [T] recall that the RIP of 
order S with constant 5s G (0, 1) holds if 

|||$CRD*$CRD - I||| < 6s. 

Using ([9]), we want to show that 

III*crd $ crd - E[$crd $ crd]||| + |||A||| < 6 S . (23) 

We have already bounded 1 1 1 A 1 1 1 in Section [V] We bound 
the first term by leveraging the results of [6] along with an 
argument similar to that used in [39] for proving the RIP of 
Toeplitz matrices. Before we continue, recall that the separa- 
tion between two rows of $crd required for independence 
between the rows is p = — 1)] < {I — 1). In what 

follows, let denote the r th row of <£rd or $crd depending 



• E^rd < (EB 2 ) 1/2 VCS\og 4 W < 
and 

• IP{^rd >S}< 8W~\ 

provided R > CS~ 2 • S\og 6 (W). Note that 



log 5 W 



< 5, 



B = max | (p roj | < 



10 log W 



R 



with probability exceeding 1 — W 1 . 

To bound the first term in ( [23] ), we proceed as follows 

ZcRD = IH^CRD^CRD ~ E^Jrd^CRD 1 1 1 




where R, 

inequality tells us that 



{(p + l)n + s}> n = 0,1,..., ^fx -I- The triangle 



^crd < ^ 



6 = 1 



^ ZfZj, ~^jZy> Zy, 



reR s 



s=l 



Each Z s is the norm of a sum of independent random 
variables, and we can apply Proposition [T] to each of them. 
Using Lemma [T] to obtain the value of B needed in Proposition 
[TJ we get 



EZ CRD <^EZ S <^ 



'C-^Slog 5 W 
R 



(P+1)1 



'C-^log 5 W 



R 



We require that EZcrd < S f for 8' G (0, 1) which is achieved 
as long as 

R> C£(p + 1) 2 (5')- 2 S log 5 W. 

We can similarly appeal to the probability bound in Proposi- 
tion Q] to obtain 

F{Z s >5'/(p+l)}<8W- 1 
ifR> C£(p+l) 2 ((5 / )" 2 5'log 6 W. Returning to <23), we have 



1$ 



CRD 



$CRD -I||| < S 



if < (5 - |||A|||), and the RIP of order 5 is satisfied with 
constant ^5 < S completing the proof of Theorem [T] 
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Appendix B 
Recovery under the Random Signal Model 

To prove Theorem [2| we must bound the coherence and 
column norms of the matrix $crd- To bound the coherence, 
we bound the maximum absolute entry of X ([5]): 



max \x a ^\ = max 

a.oo ' a, 00 



^SjSkVjkfjafkoo 



If the sequence e is not independent, but has maximum 
dependence distance I, then we need to break the sum up into 
smaller sums. Define the sets J a = {n£ + a}, < a < i — 1, 
< n < \ - 1 and Kj = {j - (£ - 1), j + 1)}. We 
now apply the triangle inequality twice to \x aoj \\ 



^SjSkrjjkfjgfkoo 
( 



E 



\ 



E £ j £ k r ljkfjafku 



. keKj 



e 3 e kVjkfjafi 

K k(£Kj 



koo 



< 



< 



E E £ J £ kVjkfjafkoo 



j keKj 

k^j 



E 

a=0 



E £ j £ kVjkfjafkoo 



jeJ a , 



E E £ j £ k r ljkfjafkoo 



j keKj 

k^j 



E 

a=0 



E £ j £ kVjkfj afkoo 



jeJ a , 

kgKj 



= e + y. m - 

a=0 

Each M a is a second-order Rademacher chaos because 
of the indices of summation, J a and ifj, and we need the 
following to deal with such a sum. 

Proposition 2. (6] Lemma 6] Suppose that R > 2\ogW. 
Let [ej] be an independent modulating sequence and define 
Xaoo = Y,j^k £ j £ kVjkfj a fkoo and X = [x au) ]. Then 



E p [||X|| max ] <8C 



logW 



R 



and 



|xiu, x >c A /^ \<w-\ 



Applying this proposition to each M a , we get 

-£-1 



logW 



E M « 



=0 



\ogW 



R 



It follows from Markov's inequality that 

fi-i 



,a=0 



J2M a >CiJ^\<W-\ 



Now we are left to deal with E. Whenever W/R > £ we 
can drop rjjk because rjjk = 1 over the index of summation. 
We can then rewrite E in this case as follows: 



E 



E E £ j £ kVjkfjafi 

j keKj,k^j 



koo 



E £ i//« ( E £k f> 

keKj.k^j 



koo 



ejfjo | exp (»• phase (f#>) ) 

j 

^2 £ jfja 



where 



E. 



= E £ kfkoo, 
keKj,ky£j 



phase(-) is the phase angle of the complex argument, and 
fj a = f* a - exp (i- phase ^Eg^)) • m sn °rt order, we will 

bound l-E^I < t2 Vj with high probability so that E can 
be bounded as 



E < 



^2 £ jfja 



•h — E\-t2 



with high probability. To bound E\ and to find t<i, we turn to 
a result to bound the norm of a random series generated from 
a Markov chain. 

Proposition 3. pOl Corollary 4] Let e = [ej] be a sequence 
of random variables generated from a Markov chain with Sj G 
{+1,-1} equally likely. Let the matrix T be the matrix defined 
in Section \III-C Let bi for 1 < i < n be arbitrary complex 
numbers and let f = ElLi £ i^i\- For ever y t > 0, 



'(|/-E[/]| >t) <exp 



8a 2 ||r|| 2 



where 



a 



En 2 - 

i=l 

We apply this proposition to both E\ and \E^\ with 



R 



h = ^/iogW-8o- 2 ||r|| 2 

and 

t 2 = ^io g w-iQ4\\r\\ 2 

respectively. As a result, 

> *i) < exp(-logW) = W' 1 

and 

P(l4 j) l >h) <W~ 2 
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Vj. Finally, we have that E < t\'ti except with probability 
2W~ X . To finish the calculation, note that 



w-i 



and 



Hence, 



*?=Ei/;. 

3=0 



2(£-l)/W. 



t v t 2 = J\ogW-%al\\Y\\ 2 -J\ogW-lQ(jl\\Y\\ 2 



logw-sv^Hrii 2 ^^ 
logW/ i 6 ^||r|| 2 . 



Finally, we have the following for the matrix X: 



|X|| max > cV^+t r t 2 < 3W~\ 



Note that lim^/^oo (log W/ yW) = 0, so we can make 
the second term as small as we like by requiring a large 
enough W. This leads us to the following statements about 

the coherence, \i = max ( ^ w | <I>lj)\ 9 and column norms of 
a CRD matrix: 

Lemma 3. [Coherence] Suppose that R> 2\og W. An RxW 
CRD matrix satisfies 

P ^ > C ^^+ ^16v^||r|| 2 ) < 3W" 1 . 



Lemma 4. [Column Norms] Suppose the sampling rate sat- 
isfies 

R>4-C£ 2 5- 2 \ogW 
and that W is large enough so that 



\og(W) 



< 



'W ~ 32^- i)||r|| 2 

Then, an R x W CRD matrix satisfies 

P{max|||0 w ||| - l| > < 

To prove recovery results, we finally use the following 
theorem. 

Theorem 3. (6| Corollary 15] Suppose that the sampling rate 
satisfies 

R> C[#logW + log 3 W). 
Draw an RxW RD matrix such that 



and 



{max III^Hl - 1| >s} < W-\ 



Let s be an S-sparse vector drawn according to the random 
signal model in Section [77] The solution s to the convex 
program ( [T3] ) satisfies s = s except with probability 8W _1 . 



Theorem [2] is the result of applying Lemmata [3] and [4] to 
Theorem [3] The increased requirement on R and the additional 
requirement on W is needed to ensure the coherence and 
column norms are satisfactory to ensure recovery. Addition- 
ally, the probability of recovery failing increases slightly to 
\2W~ l . 

Appendix C 
Uncorrelated implies Independence for 
Identically Distributed bipolar sequences 

Here we briefly show that if two entries in the modulating 
sequence are uncorrelated then they are independent for the 
sequences that arise in this paper. The sequences, denoted by 
[ej] for j = 1, W, that we are concerned with have two 
defining characteristics: (i) Ej G {+1,-1} and (ii) P{sj = 
+1} = P{£j = —1} = 1/2. The autocorrelation in this case 
can be expressed as: 

E[eje j+k ] = ¥{ £j = e j+k } - ¥{ £j ± s j+k }. 

Now, given the maximum dependence distance I we have 

P{£j = Sj+k} = ^{sj 7^ e j+k} for \k\ > £ which implies 
that 

in this case. Characteristic (ii) also tells us that 

¥{e j+k = +11^- = +1} + Pfo-+fc = +11^- = -1} = 1, 
meaning we must have that 

¥{e j+k = = +1} = ¥{e j+k = H-l^- = -1} = 1/2. 

The same argument applies to Ej+k — — 1 5 an ^ me condition 
for independence results: 

¥{e j+k = a\sj =b}= F{s j+k = a} 

for a,be {+1,-1}. 
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MSE - RD uniform Input - SNR 40 dB 




R/W 



(a) MSE plot for a RD for signals with a uniform 
distribution on the tones. 



MSE - RD matched Input - SNR 40 dB 
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(b) MSE plot for a RD for signals with a di stribu tion on 
the tones that matches the spectrum in Fig. |6(a)| 

MSE - CRD uniform Input - SNR 40 dB 
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(c) MSE plot for a CRD for signals with a uniform 
distribution on the tones. 



1 MSE - CRD matched Input - SNR 40 dB 
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(a) MSE plot for a RD for signals w ith a distribution on the 
tones that matches the spectrum in Fig. |6(a)| and with frequency 
leakage. 
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(b) MSE plot for a CRD for signals with a distribution on the 
tones that matches the spectrum in Fig. |6(a)| and with frequency 
leakage. 

Fig. 10. Reconstruction MSE (dB) as a function of S/R and R/W. The plots 
correspond to an SNR defined as in Fig. [9] In these experiments, non-integral 
frequencies are allowed and place energy at integral frequencies according to 
a Hamming window frequency response. 
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(d) MSE plot for a CRD for signals with a di stribu tion 
on the tones that matches the spectrum in Fig. |6(a)| 

Fig. 9. Reconstruction MSE (dB) plotted as a function of S/R and R/W. 
The plots correspond to an SNR 40 dB defined as the ratio of the power of 
the measurements to the noise variance. 



